Lexicon Acquisition with a large-coverage unification-based grammar
Abstract
We describe how unknown lexical entries are processed in a unification-based framework with large-coverage grammars, and how lexical entries are extracted from the usage of those unknown words. To keep time and space usage during parsing within bounds, information from external sources such as Part of Speech (PoS) taggers and morphological analysers is taken into account when the information for unknown words is constructed.
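The idea of narrowing the hypotheses for an unknown word with external information can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`cat`, `agr`, `num`, `vform`), the template list `OPEN_CLASS_TEMPLATES`, and the function `candidate_entries` are hypothetical, and the unifier handles only plain nested dictionaries without reentrancy. The sketch assumes a tagger or morphological analyser returns a partial feature structure as a hint.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic values).

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # clash on a shared feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values (or a dict against an atom) unify only if equal.
    return a if a == b else None


# Hypothetical open-class templates an unknown word might instantiate.
OPEN_CLASS_TEMPLATES = [
    {"cat": "noun", "agr": {"num": "sg"}},
    {"cat": "noun", "agr": {"num": "pl"}},
    {"cat": "verb", "vform": "fin"},
    {"cat": "adj"},
]


def candidate_entries(word, hint=None):
    """Build candidate lexical entries for an unknown word.

    `hint` is a partial feature structure, assumed to come from an
    external PoS tagger or morphological analyser. Templates that do
    not unify with the hint are discarded before parsing.
    """
    hint = hint or {}
    candidates = []
    for template in OPEN_CLASS_TEMPLATES:
        entry = unify(template, hint)
        if entry is not None:
            candidates.append({"form": word, **entry})
    return candidates


if __name__ == "__main__":
    # Without a hint every template survives; with one, far fewer do.
    print(len(candidate_entries("blorks")))
    print(candidate_entries("blorks", {"cat": "noun", "agr": {"num": "pl"}}))
```

In this sketch the parser would have to try four entries for the unknown word when no hint is available, but only one when the external source supplies category and number, which is the kind of reduction in search space the abstract refers to.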
Similar resources
Integrating Probabilistic and Knowledge-based Approaches to Corpus Parsing
We have developed a prototype system for syntactic parsing of corpus text based on a wide-coverage unification-based grammar of English and domain-independent statistical techniques for selecting the most plausible parses from the typically large number licensed by the grammar. Although the results from initial experiments are promising, the system is ‘brittle’, relying particularly on the corr...
Lexicon Acquisition with and for Symbolic NLP-Systems – a Bootstrapping Approach
We present a method of applying a broad-coverage LFG grammar of German in the process of semi-automatic lexicon acquisition from corpora. The identification of corpus instances that illustrate a certain subcategorization frame uniquely is done by a comparison of the numbers of analyses the grammar assigns to the corpus instances, under the assumption of different hypothetical lexicon entries fo...
The Automatic Acquisition of Verb Subcategorisations and Their Impact on the Performance of an HPSG Parser
We describe the automatic acquisition of a lexicon of verb subcategorisations from a domain-specific corpus, and an evaluation of the impact this lexicon has on the performance of a “deep”, HPSG parser of English. We conducted two experiments to determine whether the empirically extracted verb stems would enhance the lexical coverage of the grammar and to see whether the automatically extracted...
Automatically Extending the Lexicon for Parsing
This paper describes a method for automatically extending the lexicon of wide-coverage parsers. The method is an extension to the automatic detection of coverage problems of natural language parsers, based on large amounts of raw text (van Noord 2004). The goal is to extend grammar coverage, focusing in particular on the acquisition of lexical information for missing and incomplete lexicon entr...
D-PATR: A Development Environment for Unification-Based Grammars
D-PATR is a development environment for unification-based grammars on Xerox 1100 series workstations. It is based on the PATR formalism developed at SRI International. This formalism is suitable for encoding a wide variety of grammars. At one end of this range are simple phrase-structure grammars with no feature augmentations. The PATR formalism can also be used to encode grammars that are b...